Due to their “inherent parallelism”, interaction nets have, since their introduction, been considered
an attractive implementation mechanism for functional programming. We show that a simple
highly-concurrent implementation in Haskell can achieve promising speed-ups on multiple cores.


1 Introduction 

The interaction nets introduced by Lafont [Laf90] can be considered as a variant of term graphs, that is,
as a kind of graph used to represent terms. Interaction nets are equipped with an
“inherently parallel” local and confluent reduction mechanism that makes them an, at least conceptually,
attractive target for (functional) programming language implementation. However, to date there have
been only limited experiments with parallel implementations of interaction nets, and no easily-usable
parallel implementation is publicly available. In addition, the parallelism of interaction net
reduction is in general rather fine-grained, so that the question of distribution strategies arises naturally.

In this paper, we report on an experiment that bypasses the question of distribution strategies, and
instead investigates whether a fine-grained threading mechanism with parallel execution on shared-memory
multi-core systems, as provided by the run-time system of the Glasgow Haskell Compiler (GHC), can
already realise the potential of parallelisation offered by interaction nets. Our implementation is
publicly available (at http://www.cas.mcmaster.ca/~kahl/Haskell/HINet/) and accepts a slightly
restricted version of the Inets file format, enabling further experiments also by other interaction net
researchers. In the benchmarking section, we provide extensive data, and also discuss the potential pitfalls
of benchmarking Haskell programs with large heap requirements, in order to help potential users of our
system avoid these pitfalls.

1.1 From Term Graphs via Jungles and Code Graphs to Interaction Nets 

We now give an introduction to interaction nets that puts them into the context of different term graph 
representations. We do this for two reasons: First, to make interaction nets more accessible for readers 
interested in functional programming language implementation, who may already be familiar with graph 
reduction, but might find the principal-port orientation of most of the interaction net literature rather 
obscure, and second, to give a clear understanding of polarities, which have almost disappeared from the 
interaction net literature. 

Conventional term graphs (see e.g. [KKSV93]) are node-labelled directed graphs, where each node
has a sequence of outgoing edges, the length of which is determined by (or is sometimes part of) the label.
Node labels of these term graphs correspond to function symbols in terms; variables do not need labels:
Different variable nodes (labelled “V” below) represent different variables.

The “jungle” approach of Hoffmann and Plump [HP91] moves the function symbols into hyperedges,
with a sequence of “argument tentacles” (or “input tentacles”) extending to argument nodes, and (normally)
exactly one “result tentacle” (or “output tentacle”) extending to the hyperedge’s result node (or
output node). In both approaches, there is no restriction on the number of edges (resp. input tentacles)
incoming into each node; multiple incoming edges implement sharing, and zero incoming edges into a
non-root node implement (uncollected) “garbage”; in term graph and jungle rewriting, garbage
collection is typically implicit.





The drawing above shows a conventional term graph, a jungle, and an interaction net each representing 
the term (2 + x) * x + (2 + x) with the same degree of sharing. In all three drawings, the sequence of
the outgoing or incoming edges, respectively tentacles, or ports, of each node or hyperedge is part of the 
structure, but is, as customary, not made more explicit. 

Interaction nets are different from jungles in several ways. First of all, a different terminology is 
used: Instead of “hyperedge”, the terms “node” or “agent” are used, the nodes of jungles turn into 
“connections” and the tentacle labels and directions turn into “ports”. In interaction nets, connections 
must be incident with exactly one or two ports; those incident with only one port make up the interface of 
the net. Because of this, sharing and garbage must be made explicit via duplicator (“V”) and terminator 
(“!”) nodes. Each interaction net node label determines one principal port for its nodes. We draw 
principal ports as filled-in circles attached to the rectangular nodes, while auxiliary ports are hollow. 
Interaction net rules only replace pairs of nodes connected via their principal ports. 

The directions of edges in term graphs, and of tentacles in jungles, are motivated by denotational
semantics; the corresponding directions of connections in interaction nets were introduced under the
name polarities by Lafont [Laf90], but are omitted in a large part of the interaction net literature, where
interaction nets are drawn with undirected connections. Instead, the operationally motivated direction
of nodes (“actors”) from auxiliary ports to the principal port is typically emphasised. We follow Lafont
[Laf90] in distinguishing output ports (with positive polarity) and input ports (negative polarity), and draw
connections as directed arrows from output to input ports. Note that, apart from Lafont [Laf90], most of the
interaction net literature does not draw nets in a way that easily corresponds to a jungle reading.

Whereas jungle hyperedges have only one output tentacle, the duplicator (V) nodes of the interaction
net above have two output ports, a feature that also occurs in the code graphs of [KAC06, AK09]. We
illustrate this with a second example; the term (2/x) * y + (2%x) * y represented with sharing as a term
graph has two variable nodes corresponding to x and y; represented as a jungle, these turn into two input
nodes.




[Figure: Term Graph and Jungle]

In code graphs, the sequence of these input nodes is explicitly visualised via triangular tags with arrows
towards the input nodes; code graphs also have a sequence of output nodes visualised via triangular tags 
with arrows from the output nodes. Code graph hyperedges also have as interface a sequence of input 
nodes (as in jungles) and a sequence of output nodes, which in contrast to jungles is not constrained to 
contain exactly one element. For the sake of an example, we can therefore use a two-output operation 
“divMod” to obtain a code graph that uses a single operation to produce the same result as the two 
separate operations / and % in the term and jungle above. (The sequences of input and output nodes of 
hyperedges are still indicated implicitly via the graphical arrangement.) 



Since code graphs allow multi-output nodes, duplicators (“V”) do not need to be given any special status, 
and interaction net languages can be understood as code graph languages without node-based sharing 
(and without “garbage”), which allows us to replace the code graph nodes with their single incoming and 
outgoing tentacles with simple connections. Input and output nodes of code graphs turn into input and 
output ports of interaction nets — these are the ports of negative, respectively positive polarity that have 
no connection attached to them. As for code graphs, we will assume the input and output ports to be 
organised into two sequences, and tag them using the same triangles. 


1.2 Interaction Net Rules and Reduction 


Application of rules is defined as subnet replacement, where the input and output ports of the rule sides 
may map to arbitrary ports in the application net. Due to the constraints on the left-hand sides of rules, 
the resulting reduction has no critical pairs; it is therefore confluent and has a deterministic normalisation 
relation. Since left-hand sides match only to subnets induced by two nodes connected via their principal 
ports, reduction exhibits extreme locality, and is frequently considered as “inherently parallel”. 

Below, we show rules for addition and multiplication of natural numbers built up from the
constructors for zero (“0”) and successor (“S”). The first multiplication rule, “mult 0 n = 0”, turns n into
“garbage” by attaching a terminator (“!”) node; the second multiplication rule “duplicates” n for use both
by the addition and by the recursive call.


add 0 n = n
add (S m) n = S (add m n)

mult 0 n = 0
mult (S m) n = add n (mult m n)
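
Read as ordinary functional programs, the four rules above are the usual Peano definitions. The following is a minimal sketch in plain Haskell, not part of the implementation described in this paper; the type Nat and the helpers fromInt and toInt are names chosen here purely for illustration.

```haskell
-- Unary natural numbers built from the constructors 0 (written Z here) and S.
data Nat = Z | S Nat
  deriving (Eq, Show)

-- add 0 n = n;  add (S m) n = S (add m n)
add :: Nat -> Nat -> Nat
add Z     n = n
add (S m) n = S (add m n)

-- mult 0 n = 0;  mult (S m) n = add n (mult m n)
-- In the first rule n is simply dropped (the net rule attaches a terminator);
-- in the second rule n is used twice (the net rule needs a duplicator).
mult :: Nat -> Nat -> Nat
mult Z     _ = Z
mult (S m) n = add n (mult m n)

-- Illustrative conversions (assumed helpers, only for trying the rules out).
fromInt :: Int -> Nat
fromInt k = if k <= 0 then Z else S (fromInt (k - 1))

toInt :: Nat -> Int
toInt Z     = 0
toInt (S m) = 1 + toInt m
```

For instance, toInt (mult (fromInt 3) (fromInt 4)) evaluates to 12.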



The reader may notice that the multiplication rules provided above always perform a “superfluous” last
addition to zero if the first factor is non-zero. One might consider the following starting point instead:


mult 0 n = 0 
mult (S 0) n = n 

mult (S m) n = add n (mult m n) 





However, the “deep pattern matching” here cannot be implemented directly by conventional interaction
net reduction; the rules drawn above to the right are nevertheless allowed in the extension proposed by
Hassan et al. [HJS09], which translates them into conventional interaction net rules by adding an auxiliary
function:


mult 0 n = 0
mult (S m) n = multAux m n
multAux 0 n = n
multAux (S m) n = add n (multAux m n)
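
To make the saved addition visible, the two rule sets can be compared as plain Haskell functions that also count how many additions they start. This is only an illustrative sketch; the counting, the name multA for the variant with the auxiliary function, and the helper nat are assumptions made here and do not occur in the implementation.

```haskell
data Nat = Z | S Nat
  deriving (Eq, Show)

add :: Nat -> Nat -> Nat
add Z     n = n
add (S m) n = S (add m n)

-- Original rules; the Int counts the additions that are started.
mult :: Nat -> Nat -> (Nat, Int)
mult Z     _ = (Z, 0)
mult (S m) n = let (p, k) = mult m n in (add n p, k + 1)

-- Rules using the auxiliary function multAux.
multA :: Nat -> Nat -> (Nat, Int)
multA Z     _ = (Z, 0)
multA (S m) n = multAux m n

multAux :: Nat -> Nat -> (Nat, Int)
multAux Z     n = (n, 0)
multAux (S m) n = let (p, k) = multAux m n in (add n p, k + 1)

-- Illustrative helper: build a unary number from an Int.
nat :: Int -> Nat
nat k = if k <= 0 then Z else S (nat (k - 1))
```

With these definitions, mult (nat 3) (nat 4) yields the product together with the count 3, while multA (nat 3) (nat 4) yields the same product with the count 2: the final addition to zero is no longer performed.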





Such encoding issues are not relevant to the current study, which considers interaction nets as an
execution model, rather than as a programming language. Compilation to interaction net rules is a separate
topic, and has been studied for example by Cirstea et al. [CFF+07] using the ρ-calculus as
intermediate language.


1.3 Related Work 


Pedicini and Quaglia [PQ07] describe PELCR, a distributed parallel environment for optimal λ-calculus
reduction, which uses a specialised fixed interaction net language and implements sophisticated distribution
strategies. (I found no trace of this being or having been publicly available.) Besides such specialised
systems, we are aware of only a small number of parallel implementations of interaction nets, in particular
[BP97, Pin01, Jir14]. Of all these, only the last seems to be (still) available; it is an experimental GPU
implementation that requires new rules to be implemented manually in C/CUDA at a very low level.

A general interaction net implementation that is still available is part of the Inets project of Mackie
et al. [HMS09, HJ12]. It is a compiler for the interaction net definition language Inets, which is
considered as a programming language; the compiler is implemented in Java, and compiles via C to
non-parallel executables. While Inets implements nets as pointer structures, the (apparently unavailable)
successor system “Light” [HMS10], as well as the systems of Pinto [Pin01] and Jiresch [Jir14], are based
on a term representation of interaction nets (based on the fact already pointed out by Lafont [Laf90] that
“well-behaved” fully reduced nets can always be represented via pairs of terms with common variables
and further constraints). Lippi’s implementation called “in²” [Lip02] was apparently close in spirit, but
not directly based on terms.

Other available implementations are geared more towards graphical interaction directly with interaction
nets (and also don’t support parallel execution), including de Falco’s “Interaction Nets Laboratory”
[Fal06], the “interaction net IDE” INblobs of Almeida et al. [APV08], and the graph rewriting system
IDE “PORGY” [AFK+11], which can also be used for interaction nets. By emphasising visualisation of
net transformations, these tools by design cannot target efficient parallel implementation.

1.4 Contribution and Overview

We present a design for highly concurrent interaction net implementations that is at the same time
surprisingly simple and very close to the graph understanding of the interaction net definition. The parallel
implementation of concurrency in the Glasgow Haskell Compiler (GHC) is a good fit for this kind of
design; our implementation obtains satisfactory speed-ups even for simple examples.

While most current non-graphical implementations of interaction nets are based on a term-based
calculus, we explain our more direct approach in Sect. 2. The actual (literate) Haskell source code
of the kernel of our implementation is then presented in Sect. 3; the full source code is available
on-line at http://www.cas.mcmaster.ca/~kahl/Haskell/HINet/. In Sect. 4 we summarise our
implementation of a language similar to that of Inets [HJ12]. Measurements and relevant observations
are in Sect. 5.

2 Implementation Design 

Our implementation essentially follows the main ideas of Banach and Papadopoulos [BP97]:

• Two-way connections, which easily introduce opportunities for deadlock and race conditions, can 
be avoided by using polarities to direct the connections between ports (which, in a large part of the 
literature, are treated as undirected, and implemented as two-way connections). 

• These directed connections hold mutable state. 

• The connection with the principal port of a constructor does not need to be known to the constructor 
node if the connection state refers to the node. 

The following main decisions then determine most of our implementation details: 

• Connections (drawn below as thick circles) are initially “empty”, and each node has references to 
the connections attached to its auxiliary ports. 

• Attaching the principal port of a constructor to a connection deposits a reference to the constructor 
node in the connection (which is then “full”). (This reference is drawn below with a thick arrow 
with a bullet tail.) 

• Attaching the principal port of a function to a connection starts a concurrent thread that waits for 
a constructor reference in that connection, and if/when it finds one, starts the corresponding rule 
application. (This is drawn below with an even thicker arrow ending inside the connection.) 

The following shows a net fragment first in the same style as the previous example, and to the right with 
implementation details added. 











A Simple Parallel Implementation of Interaction Nets in Haskell 


3 Implementation in Concurrent Haskell 


We implement connections using the Concurrent Haskell synchronisation primitive MVar, which can
be created empty; putMVar waits until the MVar is empty and then fills it, and takeMVar waits until it
is full and then empties it [PJGF96]. The GHC version of Concurrent Haskell has an extremely light-weight
thread implementation that makes it feasible to create millions of threads; we therefore directly create
new threads for functions as mentioned above, and even smaller threads for short-circuiting two interface
ports that are directly connected by rule applications: these threads only wait for a constructor on the
originally negative port of the LHS, and copy it to the positive side.
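
The interplay of empty connections, constructor references and waiting function threads can be tried out in isolation. The following is a deliberately simplified sketch that hard-codes the two addition rules of Sect. 1.2 and uses only the base library; it is not the generic implementation presented in this section. The names Node, Conn, attachAdd, fromInt, toInt and runAdd are assumptions made for this illustration, and moveMVar is given one plausible definition here.

```haskell
import Control.Concurrent (forkIO)
import Control.Concurrent.MVar (MVar, newEmptyMVar, newMVar, putMVar, takeMVar)

-- Constructor nodes for unary naturals; S keeps the connection at its auxiliary port.
data Node = Z | S Conn

-- A connection stays empty until a constructor attaches its principal port to it.
type Conn = MVar Node

-- Short-circuit: wait for a constructor on one connection and copy it to the other.
moveMVar :: Conn -> Conn -> IO ()
moveMVar src dst = takeMVar src >>= putMVar dst

-- Attach an "add" function node: p is the connection at its principal port,
-- n and r are the connections at its auxiliary ports (second argument, result).
attachAdd :: Conn -> Conn -> Conn -> IO ()
attachAdd p n r = do
  _ <- forkIO $ do
    c <- takeMVar p              -- blocks until a constructor arrives
    case c of
      Z   -> moveMVar n r        -- add 0 n = n
      S m -> do                  -- add (S m) n = S (add m n)
        r' <- newEmptyMVar
        putMVar r (S r')         -- the new S constructor fills the result connection
        attachAdd m n r'         -- the new add node gets its own thread
  return ()

-- Illustrative helpers: build a unary number and read one back.
fromInt :: Int -> IO Conn
fromInt k
  | k <= 0    = newMVar Z
  | otherwise = fromInt (k - 1) >>= newMVar . S

toInt :: Conn -> IO Int
toInt c = do
  node <- takeMVar c
  case node of
    Z   -> return 0
    S m -> fmap (+ 1) (toInt m)

runAdd :: Int -> Int -> IO Int
runAdd a b = do
  p <- fromInt a
  n <- fromInt b
  r <- newEmptyMVar
  attachAdd p n r
  toInt r
```

In this sketch runAdd 3 4 returns 7: the reading thread blocks on each result connection until the corresponding add thread has deposited a constructor there.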


The run-time implementation of nets, based on MVars, is introduced in Sect. 3.2. For the static
representation of rules, our implementation uses an immutable datatype NetDescription to represent
right-hand sides (RHSs) of reduction rules; these are introduced in Sect. 3.3. At run-time, these NetDescriptions
are instantiated into new parts of the mutable run-time net, as fully defined in Sect. 3.4, following the
principles outlined in Sect. 2.


3.1 Polarity 

Lafont [Laf90] and Banach and Papadopoulos [BP97] use typed connections in their interaction nets,
where the two ports incident in a connection have the same type, but different polarity. Since we design
our interaction net implementation as a run-time system, types are currently not important, and will be
assumed to have been taken care of before net generation. Polarity, however, drives several run-time
decisions; for the sake of readability, we define a special-purpose data type for it (and let Haskell’s
“deriving” mechanism provide us with the default implementation of equality and ordering tests, and of
conversion to strings):

data Polarity = Neg | Pos
  deriving (Eq, Ord, Show)

opposite :: Polarity -> Polarity
opposite Neg = Pos
opposite Pos = Neg

We will follow Lafont’s convention of letting “constructors” have positive polarity, and “functions”
negative polarity.


3.2 Mutable Net Representation 

A connection between two ports is implemented as a single MVar that is either empty, or contains the 
constructor node for which the connection is at the principal port. (To allow different node label types to 
be used, we use the type variable nLab throughout.) 

type Conn nLab = MVar (Node nLab) 

For an auxiliary port of a node, besides its connection we also record the port’s polarity to make it
available efficiently at run-time. (In Haskell, data constructors for simple record types habitually are
given the same name as the type constructor; the fields pol and conn here are declared strict using “!”,
and the “UNPACK” pragma requests an “unpacking” optimisation from the compiler.)

data Port nLab = Port
  { pol  :: !Polarity
  , conn :: {-# UNPACK #-} !(Conn nLab)
  }

Wolfram Kahl

We introduce the type synonym Ports to abbreviate the type of port arrays:

type Ports nLab = Vector (Port nLab)

Given a port p, the port at the other end of its connection is obtained as opPort p by flipping the polarity: 

opPort :: Port nLab -> Port nLab
opPort p = p { pol = opposite $ pol p }

A node contains a label, and the array of its non-principal ports. We do not include the principal port in
ports since

• the principal port of a constructor is connected to the MVar pointing back to the constructor, and 

• the principal port of a function is connected to the MVar the function’s thread is waiting on. 

data Node nLab = Node 
{label:: nLab 
,ports:: Ports nLab 
} 


3.3 Net Descriptions 

Whereas in Sect. 3.2 we introduced types for nets considered as run-time states, here we introduce net
descriptions for the static representation of, in particular, rule right-hand sides.

The following types are dictated by our current choice of array implementation (Data.Vector from 
the vector package, for efficiency), but aliased for readability: 

type PI = Int -- “port index”
type NI = Int -- “node index”

The port index type PI will be used also in actual nets, while the node index type NI is needed only for
right-hand side nodes in descriptions and during creation. We arbitrarily call the two nodes engaged in 
an interaction “source” and “target”; the “source” interface consists of the auxiliary ports of the node 
with the “function” label with negative principal port, and the “target” interface consists of the auxiliary 
ports of the “constructor” node with positive principal port. The following data type serves to identify 
all ports in a rule’s right-hand side (the “!” specifies strict constructor argument positions for efficiency): 

data PortTargetDescription
  = SourcePort !PI
  | InternalPort !NI !PI -- node, port
  | TargetPort !PI
  deriving (Eq, Ord, Show)

Therefore, each RHS node is described by its label and by the connections of all its ports: 

data NodeDescription nLab = NodeDescription
  { nLab :: !nLab
  , portDescriptions :: {-# UNPACK #-} !(Vector PortTargetDescription)
  }

A NetDescription is intended as a description of the RHS of interaction rules:

data NetDescription nLab = NetDescription
  { source :: {-# UNPACK #-} !(Vector PortTargetDescription)
  , target :: {-# UNPACK #-} !(Vector PortTargetDescription)
  , nodes  :: {-# UNPACK #-} !(Vector (NodeDescription nLab))
  }

A language for interaction nets consists of a type of node labels together with arity and polarity
information defining all ports for each node label, and, for any “function” node label f and any “constructor” node
label c that can occur as “argument” to f, a rule, specified by a right-hand side ruleRHS f c, which needs
to be a net description having a source compatible with the auxiliary ports of f, and a target compatible
with the auxiliary ports of c.

data INetLang nLab = INetLang
  { polarity :: !(nLab -> Vector Polarity)
  , ruleRHS  :: !(nLab -> nLab -> NetDescription nLab)
  }


3.4 Interaction Net Reduction 


The main purpose of the function replaceNet is to implement the instantiation part of the rule application 
step. It is a separate function because it also serves the secondary purpose of constructing the start net. 

The function replaceNet takes as arguments a NetDescription (defined in Sect. 3.3) for the rule’s
RHS, and arrays src and trg containing the non-principal connections of the two nodes of the image of
the rule’s LHS in the mutable net representation (Sect. 3.2) of the run-time state.

The mdo is a “recursive do” as introduced by [EL02], and the use here essentially corresponds to the
imperative programming pattern of allocating an array of uninitialised cells, and creating references to
the array cells possibly before initialising them. (Functions prefixed with “V.” operate on Vectors.)


replaceNet :: forall nLab . INetLang nLab -> NetDescription nLab
           -> Ports nLab -> Ports nLab -> IO ()
replaceNet lang descr src trg = mdo
  nps <- let mkNode (NodeDescription lab pds) = do
               ps <- V.zipWithM mkPort (polarity lang lab) pds
               return (Node {label = lab, ports = V.tail ps}
                      ,V.head ps
                      )
               where mkPort Pos (InternalPort _ _) = fmap (Port Pos) newEmptyMVar
                     mkPort _   ptd                = return (portTarget ptd)
         in V.mapM mkNode (nodes descr)

The first step above creates descr image nodes, taking over interface ports from src and trg, creating 
new internal connections at positive ports, and lazily connecting negative ports with internal connections 
located via the function portTarget defined below. 

Note that the prose explanations here are interspersed within the scope of the mdo above, since all 
code before the definition of reduce below remains indented below the mdo. 


  let portTarget :: PortTargetDescription -> Port nLab
      portTarget (SourcePort i) = atErr "portTarget: SourcePort " src (pred i)
      portTarget (TargetPort i) = atErr "portTarget: TargetPort " trg (pred i)
      portTarget (InternalPort n i) = let e = "portTarget: InternalPort "
                                          (n', pp) = atErr e nps n
        in opPort (if i == 0 then pp else atErr (e ++ shows n " ") (ports n') (pred i))

We traverse the newly created nodes and “connect” their principal ports. 

  let doNode (n@(Node lab prts), Port pl c) = case pl of
        Neg -> forkIO (reduce lang (ruleRHS lang lab) c prts) >> return ()
        Pos -> putMVar c n
   in V.mapM_ doNode nps

For source and target ports, we only need to take care of short-circuits: 

  let doIfacePort (Port Pos c) ptd = return () -- will be done from the other side if necessary
      doIfacePort (Port Neg c) ptd = let       -- original port of the LHS node
          Port _pl' c' = portTarget ptd        -- connecting port in image of RHS
        in if c == c' then return ()           -- empty cycle
           else case ptd of
             InternalPort _ _ -> return ()     -- already dealt with
             _ -> do forkIO (moveMVar c c')
                     return ()
   in do V.zipWithM_ doIfacePort src $ source descr
         V.zipWithM_ doIfacePort trg $ target descr

Whenever a function node is created, i.e., a node with negative principal port, a reduce thread is started
(via forkIO). This thread waits on the connection (pconn) between the principal ports of the rule until this
contains the constructor node (the principal port of which has positive polarity). The array src contains
the auxiliary ports of the function node (the principal port of which has negative polarity).

reduce :: INetLang nLab -> (nLab -> NetDescription nLab) -> Conn nLab -> Ports nLab -> IO ()
reduce lang rules pconn src = do
  Node clab trg <- takeMVar pconn
  replaceNet lang (rules clab) src trg


4 Reading .inet Files

The Inets project led by Ian Mackie has implemented the only publicly available general implementation
of interaction nets, the compiler [HJ12] for the interaction net programming language “Inets”. This
language was introduced by Mackie [Mac05], with the core of the Inets implementation described in
[HMS09].

We implemented a front-end to our interaction net reduction system for the core sublanguage of
Inets, leaving out in particular the extension of nested pattern matching described in [HMS10], and
generic rules and variadic agents.

Since our system depends on polarity for its directed implementation of connections, but Inets has no
concept of polarity, we adopted the convention that the first-mentioned agent of each rule has negative
principal port (that is, is considered as a function), and the second agent has positive principal port
(constructor). This convention is adopted in most of the Inets examples anyway; only two rules in
fibonacci.inet had been written the other way around. From this starting point we attempt to deduce
the polarities of all other ports; for the examples accessible to us so far, we only needed to add a single
additional heuristic: A function for which all other ports except one are known to have negative polarity
is assumed to have positive polarity on the last port. (Unfortunately the λ-calculus evaluator yale.inet
[Mac98] is defined in a way that does not allow a consistent assignment of polarities.)
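
The additional heuristic can be stated compactly as a function on partially known port polarities. The following is only a sketch, assuming that the ports of a function agent are given as a list in which unknown polarities are represented by Nothing; the name inferLastPolarity and this representation are hypothetical and are not taken from the actual front-end.

```haskell
data Polarity = Neg | Pos
  deriving (Eq, Show)

-- If exactly one port polarity is still unknown and all known ones are
-- negative, assume that the remaining port is positive; otherwise leave
-- the list unchanged.
inferLastPolarity :: [Maybe Polarity] -> [Maybe Polarity]
inferLastPolarity ps
  | length unknown == 1 && all (== Just Neg) known = map fill ps
  | otherwise                                      = ps
  where
    unknown = filter (== Nothing) ps
    known   = filter (/= Nothing) ps
    fill Nothing = Just Pos
    fill known'  = known'
```

For example, inferLastPolarity [Just Neg, Just Neg, Nothing] yields [Just Neg, Just Neg, Just Pos], whereas a list with two unknown entries is returned unchanged.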

Inets supports “parameters”, that is, agent attributes of the primitive types int, bool, float, char,
and String. The description in [HMS09] suggests that only a single parameter is allowed per agent; our
implementation allows arbitrary numbers, but expects the number and types of attributes to be determined
by the agent label. We also interpret type int as Haskell’s arbitrary-precision Integer type. Our current
interpreting implementation uses a parameterised agent label type:

data NLab arg = NLab { nLabName:: Name, nLabAttrs:: [arg]} 

When reading a .inet file, the nets on the rule RHSs are translated into NetDescription (NLab Expression)
and stored in a finite map for lookup by the rule LHS agent label pair; in the run-time net, agent labels of
type NLab Value are used, and the variable bindings induced by the attributes of the interacting nodes are
used at the time of rule application to evaluate the expressions in the RHSs (and the condition expressions
for the conditional structure of Inets RHSs).

Inets modules can contain global variables, which are used in the examples to implement reduction 
counts etc.; since in a parallel implementation such global variables would require synchronisation (and 
thus would destroy the independence of parallel reduction), we did not implement any feature related to 
global variables. 


5 Benchmarks 


For our first examples, we use a cascading recursion for calculating Fibonacci numbers, and the
Ackermann function, both computing with unary natural numbers constructed from zero Z and the unary
successor constructor S:


fib 0 = 0
fib (S n) = fibAux n
fibAux 0 = 1
fibAux (S n) = fib n + fibAux n

ack 0 n = S n
ack (S m) n = ackAux m n
ackAux m 0 = ack m 1
ackAux m (S n) = ack m (ack (S m) n)
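
As a sequential point of reference, these rules can be transcribed directly into Haskell functions on unary naturals. This is only a sketch for checking expected results, not the NetDescription encoding used for the measurements; the type Nat, the addition add and the conversion helpers are assumptions introduced here.

```haskell
data Nat = Z | S Nat

add :: Nat -> Nat -> Nat
add Z     n = n
add (S m) n = S (add m n)

fib, fibAux :: Nat -> Nat
fib Z     = Z
fib (S n) = fibAux n
fibAux Z     = S Z
fibAux (S n) = add (fib n) (fibAux n)

ack, ackAux :: Nat -> Nat -> Nat
ack Z     n = S n
ack (S m) n = ackAux m n
ackAux m Z     = ack m (S Z)
ackAux m (S n) = ack m (ack (S m) n)

-- Illustrative conversions between Int and unary naturals.
fromInt :: Int -> Nat
fromInt k = if k <= 0 then Z else S (fromInt (k - 1))

toInt :: Nat -> Int
toInt Z     = 0
toInt (S n) = 1 + toInt n
```

With these definitions, toInt (ack (fromInt 3) (fromInt 6)) evaluates to 509, the value listed in the result column for ackND 3 6, and toInt (fib (fromInt 10)) evaluates to 55.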


These rules were directly encoded using NetDescriptions (see Sect. 3.3); we will refer to these
implementations now as fibND and ackND.

We timed the actual code of Sect. 3 on a six-core 2.8GHz Phenom 2 with 16GB main memory;
our implementation achieved the timings in Table 1, where the GHC run-time system is instructed by
“-Nk” to use k cores for parallel processing. The user-space time of a Haskell process is divided into
“mutation” time and garbage collection time. The run-time system can be made to report these times
and further information; in the tables we include, after the elapsed time for each process (which is
the “real” time as reported by the “time” BASH built-in), the “allocation rate”, which measures how many
megabytes are allocated on the Haskell heap per second of mutation time, and the “productivity”, which
is the result of dividing the mutation time by the elapsed time. For example, a productivity of 240% for
a three-core (“-N3”) run means that each core spent on average 20% of its time on garbage collection,
since 240% + 3 × 20% = 300%. The last column in each of the groups for “-N2” to “-N6” contains the
speedup over single-core execution.
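
The arithmetic behind these two derived figures can be written down as follows. This is merely an illustrative sketch; the function names gcSharePerCore and speedup are chosen here and are not part of the benchmarking setup, and the first function follows the simplification used above that all elapsed time not spent on mutation counts as garbage collection.

```haskell
-- Average share of the elapsed time (in %) that each core does not spend on
-- mutation, given the number of cores and the productivity in % of elapsed time.
gcSharePerCore :: Int -> Double -> Double
gcSharePerCore cores productivity =
  (100 * fromIntegral cores - productivity) / fromIntegral cores

-- Speedup of a k-core run over the single-core run, from the two elapsed times.
speedup :: Double -> Double -> Double
speedup t1 tk = t1 / tk
```

Here gcSharePerCore 3 240 gives 20, matching the example above, and speedup 1.078 0.630 gives roughly 1.71, the value reported for the default-heap “-N2” run of ackND 3 6.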

By default, the GHC run-time system starts execution with a small heap and grows it by relatively
small increments on demand; we indicate use of this default setting by “dft.” in the third column
(“heap”). Where a size is specified in this column, this size was given to the run-time system as fixed
heap size (with options -H and -M).


expr. 


result 


heap 



N1 

time 

( 

s) 1 

allocation rai 
-N2 1 

te 

; (MB per mutation secoi 
1 -N3 1 

i( 

j) 1 

productivity 
-N4 1 


(% of elapsed) 

1 -N5 


speedup 

-N6 


ackND 3 6 


509 


dft. 


1.078 

1699 

48 


0.630 

1189 

118 

1.71 


0.483 

952 

192 

2.23 


0.431 

779 

264 

2.50 


0.427 

611 

340 

2.52 


0.426 

506 

411 

0 

ackND 3 6 


509 


2M 


1.004 

1716 

51 


0.633 

1188 

118 

1.59 


0.489 

960 

189 

2.05 


0.452 

738 

266 

2.22 


0.411 

647 

334 

2.44 


0.421 

516 

409 

2.38 

ackND 3 6 


509 


3M 


0.800 

1693 

65 


0.568 

1196 

130 

1.41 


0.478 

960 

193 

1.67 


0.445 

741 

269 

1.80 


0.409 

652 

333 

1.96 


0.429 

509 

405 

1.86 

ackND 3 6 


509 


4M 


0.694 

1678 

76 


0.504 

1202 

146 

1.38 


0.444 

942 

212 

1.56 


0.433 

747 

274 

1.60 


0.405 

646 

339 

1.71 


0.425 

512 

407 

1.63 

ackND 3 6 


509 


5M 


0.652 

1652 

82 


0.475 

1194 

156 

1.37 


0.415 

948 

225 

1.57 


0.404 

751 

293 

1.61 


0.385 

652 

354 

1.69 


0.422 

510 

413 

1.55 

ackND 3 6 


509 


6M 


0.647 

1604 

85.3 


0.462 

1181 

163 

1.40 


0.400 

945 

235 

1.62 


0.387 

745 

308 

1.67 


0.375 

647 

366 

1.73 


0.395 

522 

431 

1.64 

ackND 3 6 


509 


7M 


0.644 

1575 

87 


0.459 

1159 

167 

1.40 


0.389 

943 

242 

1.66 


0.382 

739 

315 

1.69 


0.361 

652 

377 

1.78 


0.395 

521 

431 

1.63 

ackND 3 6 


509 


8M 


0.659 

1521 

88 


0.472 

1108 

170 

1.40 


0.396 

918 

244 

1.66 


0.384 

727 

319 

1.72 


0.363 

640 

383 

1.81 


0.388 

511 

447 

1.69 

ackND 3 6 


509 


9M 


0.675 

1469 

89 


0.482 

1070 

172 

1.40 


0.411 

873 

248 

1.64 


0.393 

705 

321 

1.72 


0.374 

615 

386 

1.80 


0.392 

501 

452 

1.72 

ackND 3 6 


509 


lOM 


0.686 

1437 

90 


0.485 

1061 

172 

1.41 


0.420 

846 

250 

1.63 


0.404 

678 

324 

1.70 


0.379 

600 

391 

1.81 


0.399 

489 

456 

1.72 

ackND 3 6 


509 


O.IG 


0.749 

1305 

92 


0.522 

982 

175 

1.43 


0.445 

796 

253 

1.68 


0.430 

635 

329 

1.74 


0.416 

544 

397 

1.80 


0.435 

452 

458 

1.72 

ackND 3 7    1021 dft. |   5.866  1676   36 |   3.177  1185   94  1.85 |   2.287   982  158  2.56 |   1.990   802  223  2.95 |   1.845   642  300  3.18 |   1.771   547  367  3.31
ackND 3 7    1021   6M |   3.335  1585   67 |   2.288  1181  131  1.46 |   1.877   979  193  1.78 |   1.815   764  256  1.84 |   1.661   681  314  2.01 |   1.728   542  380  1.93
ackND 3 7    1021   8M |   3.024  1514   78 |   2.115  1133  148  1.43 |   1.723   953  217  1.76 |   1.666   759  281  1.82 |   1.521   683  342  1.99 |   1.591   551  406  1.90
ackND 3 7    1021   9M |   3.000  1474   80 |   2.076  1122  152  1.45 |   1.715   930  223  1.75 |   1.651   743  290  1.82 |   1.496   670  355  2.01 |   1.545   547  422  1.94
ackND 3 7    1021  10M |   3.010  1438   82 |   2.107  1078  156  1.43 |   1.735   902  227  1.74 |   1.638   729  298  1.84 |   1.520   649  361  1.98 |   1.579   531  424  1.90
ackND 3 7    1021  20M |   2.932  1377   88 |   2.022  1035  170  1.45 |   1.712   848  245  1.71 |   1.637   682  319  1.79 |   1.539   592  390  1.91 |   1.579   490  460  1.86
ackND 3 7    1021   1G |   3.508  1187   91 |   2.472   881  171  1.41 |   2.085   727  246  1.68 |   1.971   590  319  1.78 |   1.835   527  384  1.91 |   1.840   448  449  1.90

ackND 3 8    2045 dft. |  30.034  1557   30 |  16.857  1138   74  1.78 |  11.412   956  131  2.63 |   9.640   791  187  3.12 |   8.597   646  256  3.49 |   8.061   550  322  3.73
ackND 3 8    2045  10M |  17.000  1423   59 |  11.116  1065  120  1.53 |   8.802   899  180  1.93 |   8.195   727  239  2.07 |   7.320   657  296  2.32 |   7.306   540  361  2.33
ackND 3 8    2045  40M |  13.089  1248   87 |   9.171   929  167  1.43 |   7.450   789  243  1.76 |   7.079   633  318  1.85 |   6.503   564  389  2.01 |   6.539   473  461  2.00
ackND 3 8    2045  60M |  13.057  1228   89 |   9.019   923  171  1.45 |   7.373   778  248  1.77 |   6.929   634  324  1.88 |   6.372   565  396  2.05 |   6.375   475  471  2.05
ackND 3 8    2045  80M |  13.110  1212   90 |   8.989   920  173  1.46 |   7.292   781  250  1.80 |   6.904   630  328  1.90 |   6.353   562  399  2.06 |   6.364   478  469  2.06
ackND 3 8    2045 100M |  13.043  1215   90 |   9.042   913  173  1.44 |   7.345   772  251  1.76 |   6.917   628  328  1.88 |   6.372   559  400  2.05 |   6.376   475  470  2.05
ackND 3 8    2045   1G |  13.849  1154   90 |   9.588   869  173  1.44 |   7.824   737  250  1.77 |   7.288   606  326  1.90 |   6.665   547  395  2.08 |   6.627   472  460  2.09
ackND 3 8    2045   8G |  15.521  1043   91 |  11.200   819  169  1.39 |   9.204   703  239  1.69 |   8.686   573  309  1.79 |   7.947   523  370  1.95 |   7.822   453  432  1.98

ackND 3 9    4093 dft. | 141.662  1415   28 |  85.999  1041   64  1.65 |  60.032   904  105  2.36 |  49.941   755  151  2.84 |  42.996   625  212  3.29 |  38.546   547  271  3.68
ackND 3 9    4093   8G |  62.920  1016   91 |  41.032   815  174  1.53 |  32.717   708  251  1.92 |  30.653   577  328  2.05 |  27.727   526  398  2.27 |  27.087   459  466  2.32
ackND 3 10   8189   8G | 300.687   837   91 | 184.245   716  174  1.63 | 141.858   643  252  2.12 | 128.381   546  328  2.34 | 115.164   501  398  2.61 | 110.754   447  464  2.71

fibND 20     6765   1G |   0.513   667   94 |   0.339   558  168  1.51 |   0.283   444  232  1.81 |   0.251   440  294  2.04 |   0.232   404  338  2.21 |   0.225   358  403  2.28
fibND 25    75025   4G |   6.651   621   85 |   4.354   527  153  1.53 |   3.437   410  212  1.93 |   3.097   410  276  2.15 |   2.883   370  327  2.31 |   2.634   332  387  2.53
fibND 28   317811   8G |   32.24   732   64 |   21.92   619  112  1.47 |   16.34   544  172  1.97 |   14.85   454  226  2.17 |   13.41   412  278  2.40 |   12.85   375  317  2.51
fibND 30   832040   8G | 139.617   762   38 | 101.619   647   62  1.37 |  62.626   557  116  2.23 |  56.072   438  158  2.49 |  49.494   423  193  2.82 |  44.478   371  247  3.14

Table 1: Benchmarks for directly-programmed NetDescriptions

In general, as long as the heap is small in comparison with the space requirements of the current
run, the run-time system spends a much higher part of its time performing garbage collection; this
manifests itself in low “productivity” entries in the tables in the rows with small fixed heap sizes
and with “dft.”. (The amount of space that is allocated on the heap by any given task varies only
minimally with different heap and parallelism settings.) Not limiting the heap size (with “-Msize”) on
longer-running tasks may lead the run-time system to use a heap that is larger than the available physical
memory, leading to drastic performance loss due to swapping of memory pages to peripheral storage.
For tasks that actually do use large heap space, not fixing the start heap size (with “-Hsize”) lets the
run-time system adopt the default behaviour at the start of the program, leading to slow-down of actually
acquiring the needed large heap. Therefore, optimal time is typically obtained using a fixed heap size,
that is, with both -H and -M set to the same size, which is what we adopted for our benchmarking.




(The GHC run-time system also provides finer control over the initial heap size, and over the size of the 
increments; we did not experiment with these here.) 
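The fixed-heap setting described above can be sketched as a small helper that assembles the run-time system options. This is only an illustration: `fixedHeapRtsArgs` and `benchmarkCommand` are hypothetical names, and the executable and file names in the usage note are assumptions; only the RTS options -N, -H and -M themselves are taken from the text and the GHC run-time system.

```haskell
-- Sketch only: the helper names below are hypothetical, not part of the
-- implementation described in this paper.

-- | RTS options for a run on `cores` capabilities with the heap fixed at
-- `heap` (for example "8G"). -H sets the start heap size and -M the maximum
-- heap size; giving both the same value fixes the heap size.
fixedHeapRtsArgs :: Int -> String -> [String]
fixedHeapRtsArgs cores heap =
  ["+RTS", "-N" ++ show cores, "-H" ++ heap, "-M" ++ heap, "-RTS"]

-- | Full command line for an executable (assumed to be linked with
-- -threaded so that -N is accepted) and its program arguments.
benchmarkCommand :: FilePath -> [String] -> Int -> String -> String
benchmarkCommand exe args cores heap =
  unwords (exe : args ++ fixedHeapRtsArgs cores heap)
```

For example, `benchmarkCommand "./RunInets" ["fib.inet"] 4 "8G"` yields `./RunInets fib.inet +RTS -N4 -H8G -M8G -RTS`; the executable name and argument convention here are placeholders.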

Over its whole run-time, ackND 3 6 allocates 880MB on the heap, and ackND 3 7 allocates 3.5GB. If 
such small tasks are given large heaps, this leads to significant slow-down. As can be seen for ackND 3 8, 
which allocates 14GB, giving larger processes a generous fixed heap produces a performance that is
closer to the optimum than using the default settings.

On an 8-core 16-hyperthread 2.4GHz Xeon 8870, each of the examples we tried so far has a maximum
number of cores beyond which adding cores slows down reduction, see Table 2. This is an example
of the effect of diminishing gains of adding processors to a parallel workload that does not split into a
sufficient number of sufficiently large independent pieces: The overhead of synchronisation in such a
context makes it unfeasible to profit from the computing power of added cores beyond a task-dependent
threshold.

         |                            time (s)                             |         speed-up factor over -N1
expr.    |     -N1     -N2     -N5     -N8     -N9    -N10    -N11    -N12 |   -N2   -N5   -N8   -N9  -N10  -N11  -N12
fib 28   |  63.581  40.173  22.495  19.389  16.572  17.640  16.618  17.234 |  1.58  2.83  3.28  3.84  3.60  3.83  3.69
fib 30   | 223.291                  68.377  63.488  58.204  60.160  62.559 |              3.27  3.52  3.84  3.71  3.57
ack 3 7  |   5.900   4.177   3.234   3.889   3.786   4.042   4.033   4.170 |  1.41  1.82  1.52  1.56  1.46  1.46  1.41

Table 2: 16-core Benchmarks for directly-programmed NetDescriptions
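The speed-up columns and the task-dependent threshold can be recomputed from the timings. The following is a minimal sketch using the fib 28 times reported in Table 2; the function names are illustrative choices, and the speed-up is taken to be the -N1 time divided by the -Nk time, as the column heading states.

```haskell
import Data.List (minimumBy)
import Data.Ord (comparing)

-- | Speed-up of a run taking `tN` seconds over the single-core time `t1`.
speedup :: Double -> Double -> Double
speedup t1 tN = t1 / tN

-- | Round to two decimal places, as in the speed-up columns.
round2 :: Double -> Double
round2 x = fromIntegral (round (x * 100) :: Integer) / 100

-- | The (cores, seconds) entry with the smallest time.
fastest :: [(Int, Double)] -> (Int, Double)
fastest = minimumBy (comparing snd)

-- | Times in seconds for fib 28 from Table 2, keyed by the -N setting.
fib28Times :: [(Int, Double)]
fib28Times =
  [ (1, 63.581), (2, 40.173), (5, 22.495), (8, 19.389)
  , (9, 16.572), (10, 17.640), (11, 16.618), (12, 17.234) ]

-- | Speed-up factors over -N1 for every setting after -N1.
fib28Speedups :: [(Int, Double)]
fib28Speedups = [ (n, round2 (speedup t1 t)) | (n, t) <- drop 1 fib28Times ]
  where t1 = snd (head fib28Times)
```

Here `fastest fib28Times` picks the -N9 run, and the factors in `fib28Speedups` agree with the fib 28 speed-up entries of Table 2; beyond that setting, added cores no longer reduce the time.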

Table 3 contains timings for running our RunInets interpreter on a collection of Inets programs
mostly derived from programs in [HJ12] by replacing the main nets with larger examples. The last two
columns contain timings for running the compiled programs using the Inets compiler of [HJ12], and the
quotient of our “-N1” time with this run-time.

Ackerman.inet from [HJ12] uses a (totalised) predecessor function; Ack.inet is a direct translation
of the rules in ackND. The counts reported by the Inets implementation indicate that Ackerman.inet
requires almost exactly 1.5 times the number of rule applications of Ack.inet; Inets-compiled executables
and our RunInets take roughly 1.6 times the time.
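As a reference point for the “result” column of the Ackermann rows, here is a sketch of the standard three-equation Ackermann function in plain Haskell. It is an assumption that ackND and the two Inets programs implement exactly these equations (their rules are not reproduced in this section); the closed form for first argument 3 is a well-known identity.

```haskell
-- | Ackermann function by the usual three recursion equations
-- (intended for non-negative arguments only).
ack :: Integer -> Integer -> Integer
ack 0 n = n + 1
ack m 0 = ack (m - 1) 1
ack m n = ack (m - 1) (ack m (n - 1))

-- | Closed form for first argument 3: ack 3 n = 2^(n+3) - 3.
ack3 :: Integer -> Integer
ack3 n = 2 ^ (n + 3) - 3
```

With this definition, `ack3` gives 509, 1021, 2045, 4093 and 8189 for the arguments 6 to 10, which matches the result column reported for the Ackermann benchmarks.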

fib.inet is a direct translation of our fibND implementation into Inets, and works, like both Ackerman 
functions, on unary natural numbers constructed from S and Z. We found that fib.inet performs roughly 
20% more allocation than fibND, which will be due to the overhead of transforming an Expression- 
based NetDescription into Value-based for each rule application (even though there are no expressions 
to evaluate in this example that does not use attributes). However, it appears that the difference in run 
times is, as for Ack.inet versus ackND, much less — this should be due to the fact that the overhead is 
not slowed down by concurrency synchronisation. 

fibonacci.inet from [HJ12] carries arguments and results in node attributes, and uses
implementation-provided addition of integer attributes instead of recursing over predecessors like fibND.
It therefore has significantly less work to do than fibND.
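The difference in work between the two styles can be illustrated in ordinary Haskell. This is only an analogy under stated assumptions: `Nat`, `add`, `fibNat` and `fibInt` are illustrative names, not the rules of fibND, fib.inet or fibonacci.inet, and the recursion scheme shown is the textbook one.

```haskell
-- | Unary natural numbers built from Z and S.
data Nat = Z | S Nat

fromInt :: Int -> Nat
fromInt n
  | n <= 0    = Z
  | otherwise = S (fromInt (n - 1))

toInt :: Nat -> Int
toInt Z     = 0
toInt (S n) = 1 + toInt n

-- | Addition by recursion over the first argument: one step per S.
add :: Nat -> Nat -> Nat
add Z     n = n
add (S m) n = S (add m n)

-- | Fibonacci by recursion over predecessors, with unary addition.
fibNat :: Nat -> Nat
fibNat Z         = Z
fibNat (S Z)     = S Z
fibNat (S (S n)) = add (fibNat (S n)) (fibNat n)

-- | Fibonacci on machine-supported integers with built-in addition
-- (intended for non-negative arguments).
fibInt :: Int -> Integer
fibInt n
  | n < 2     = toInteger n
  | otherwise = fibInt (n - 1) + fibInt (n - 2)
```

In the unary version every addition costs as many steps as the size of its first argument, whereas the integer version adds in a single primitive operation; this is the sense in which the attribute-based program has less work to do.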

sort.inet from Inets is an implementation of bubble sort on lists; it uses an int-valued agent attribute to
carry the list elements, so element comparisons are performed as part of choosing the RHS of conditional
rules. The counter results of the Inets runs show that this performs exactly (n/2 + 1)·(n + 1) interactions
for a randomly generated start list with even length n. This pattern fits some of the RunInets times in
Table 3 exactly, while other RunInets times appear to exhibit a worse asymptotic behaviour; I suggest
that this is due to the fact that I used the same heap sizes for different sort argument sizes instead of
trying to identify respective optimal heap sizes.
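Reading the interaction count above as the product (n/2 + 1)·(n + 1) for even n, the expected growth between two list lengths can be computed directly. The function names below are illustrative, and the reading of the formula as a product is an assumption based on the quadratic cost of bubble sort.

```haskell
-- | Number of interactions for a start list of even length n,
-- reading the count as (n/2 + 1) * (n + 1).
sortInteractions :: Integer -> Integer
sortInteractions n = (n `div` 2 + 1) * (n + 1)

-- | Factor by which the interaction count grows from length n to length m.
growth :: Integer -> Integer -> Double
growth n m =
  fromIntegral (sortInteractions m) / fromIntegral (sortInteractions n)
```

For instance, `sortInteractions 200` is 101 · 201 = 20301, and `growth 200 400` is a little under 4, so run times that follow this pattern should roughly quadruple when the list length doubles.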

On the whole, on a single core, RunInets typically takes about 10 to 20 times the time of the Inets-
compiled executables, which is to be expected for an interpreted implementation.
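Columns: for each of -N1 to -N6: time (s), allocation rate (MB/MUT-s), productivity (% of elapsed), and (from -N2 on)
speedup over -N1; the last two columns give the Inets time (s) and the quotient of the -N1 time and the Inets time.

expr.         result heap |                -N1 |                      -N2 |                      -N3 |                      -N4 |                      -N5 |                      -N6 |          Inets
                          |    time alloc prod |    time alloc prod  spd. |    time alloc prod  spd. |    time alloc prod  spd. |    time alloc prod  spd. |    time alloc prod  spd. |    time  ratio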

Ackerman 3 6     509 dft. |   2.150  2100   38 |   1.191  1440  100  1.80 |   0.850  1214  166  2.53 |   0.770   932  238  2.79 |   0.722   758  312  2.98 |   0.716   624  382  3.00 |   0.125   17.2
Ackerman 3 6     509  20M |   1.204  1597   89 |   0.861  1163  171  1.40 |   0.744   920  251  1.62 |   0.711   745  323  1.69 |   0.700   618  396  1.72 |   0.740   497  466  1.63 |   0.125   9.63
Ackerman 3 6     509   8G |   2.163  1083   88 |   1.536   850  156  1.41 |   1.323   712  216  1.63 |   1.236   608  268  1.75 |   1.189   528  318  1.82 |   1.209   437  374  1.79 |   0.125   17.3

Ackerman 3 7    1021 dft. |  11.432  2022   30 |   6.251  1415   77  1.83 |   4.158  1226  134  1.75 |   3.668   936  200  3.12 |   3.249   779  271  3.52 |   3.080   655  339  3.71 |   0.511  22.37
Ackerman 3 7    1021  40M |   5.211  1498   88 |   3.601  1127  169  1.48 |   3.036   915  247  1.72 |   2.799   759  322  1.86 |   2.738   632  396  1.90 |   2.843   516  467  1.83 |   0.511   10.2
Ackerman 3 7    1021   8G |   8.359  1034   92 |   5.678   827  169  1.47 |   4.741   701  238  1.76 |   4.291   609  302  1.95 |   4.051   535  362  2.06 |   4.023   457  423  2.08 |   0.511   16.4

Ackerman 3 8    2045 dft. |  55.740  1899   26 |  32.567  1329   63  1.71 |  21.601  1173  108  2.58 |  18.430   915  163  3.02 |  15.819   772  225  3.52 |  14.883   652  283  3.75 |   2.050   27.2
Ackerman 3 8    2045  60M |  22.716  1408   86 |  15.671  1062  165  1.45 |  12.997   878  241  1.75 |  11.922   732  314  1.91 |  11.419   620  389  1.99 |  11.258   536  455  2.02 |   2.050   11.1
Ackerman 3 8    2045   8G |  25.963  1230   90 |  18.016   938  170  1.44 |  14.724   803  242  1.76 |  13.623   673  312  1.90 |  12.904   585  379  2.01 |  13.042   490  446  1.99 |   2.050   12.7

Ackerman 3 9    4093 dft. | 274.154  1583   25 | 161.029  1207   57  1.70 | 114.140  1072   90  2.42 |  98.409   842  133  2.79 |  82.098   725  185  3.34 |  72.683   638  237  3.77 |   7.167   38.3
Ackerman 3 9    4093 100M | 107.704  1215   84 |  68.144  1018  158  1.58 |  54.532   872  231  1.98 |  49.224   741  301  2.19 |  47.475   618  375  2.27 |  47.038   525  445  2.29 |   7.167  15.03
Ackerman 3 9    4093   8G | 116.946  1055   90 |  73.624   883  172  1.59 |  59.421   756  248  1.97 |  52.597   658  322  2.22 |  49.011   579  392  2.39 |  47.341   513  458  2.47 |   7.167   16.3
Ackerman 3 10   8189   8G | 501.817   971   91 | 326.369   774  175  1.54 | 254.119   686  253  1.97 | 222.919   606  327  2.25 | 201.495   550  398  2.49 | 191.309   500  461  2.62 |  28.677   17.5

Ack 3 6          509 dft. |   1.161  2201   42 |   0.686  1456  109  1.70 |   0.526  1189  174  2.21 |   0.475   934  245  2.44 |   0.439   780  317  2.64 |   0.443   642  383  2.62 |   0.078   14.9
Ack 3 6          509  20M |   0.710  1691   90 |   0.524  1196  174  1.35 |   0.463   926  254  1.53 |   0.433   774  325  1.64 |   0.428   645  394  1.66 |   0.502   575  377  1.41 |   0.078   9.10
Ack 3 7         1021 dft. |   6.412  2117   32 |   3.552  1427   85  1.81 |   2.473  1224  143  2.59 |   2.187   956  207  2.93 |   1.933   815  275  3.32 |   1.835   686  344  3.49 |   0.310   20.7
Ack 3 7         1021  40M |   3.023  1612   89 |   2.180  1157  172  1.39 |   1.909   908  250  1.58 |   1.709   782  325  1.77 |   1.698   646  395  1.78 |   1.896   590  388  1.59 |   0.310   9.75
Ack 3 8         2045 dft. |  31.762  2039   27 |  18.775  1357   68  1.69 |  12.488  1206  115  2.54 |  10.555   954  172  3.01 |   9.132   810  235  3.48 |   8.543   695  292  3.72 |   1.109  28.64
Ack 3 8         2045  60M |  13.675  1445   88 |   9.743  1053  169  1.40 |   8.044   875  247  1.70 |   7.291   742  321  1.88 |   6.914   637  394  1.98 |   7.027   529  467  1.95 |   1.109   12.3

fib 20          6765   1G |   0.587   714   94 |   0.390   602  169  1.50 |   0.325   527  234  1.80 |   0.291   470  293  2.02 |   0.266   428  353  2.21 |   0.260   401  385  2.26 |   0.030   19.6
fib 25         75025   1G |   8.832   838   56 |   5.531   674  112  1.60 |   4.250   576  171  2.08 |   3.754   515  216  2.35 |   3.373   456  272  2.62 |   3.141   413  323  2.81 |   0.513   17.2
fib 25         75025   2G |   7.096   797   74 |   4.603   637  144  1.54 |   3.685   562  205  1.93 |   3.398   473  264  2.09 |   3.082   446  310  2.30 |   2.958   402  359  2.40 |   0.513   13.8
fib 25         75025   4G |   7.321   677   85 |   4.709   568  160  1.55 |   3.910   486  228  1.87 |   3.445   432  291  2.13 |   3.226   404  333  2.27 |   2.958   365  406  2.47 |   0.513   14.3
fib 25         75025   8G |   7.538   667   86 |   4.907   563  157  1.54 |   4.026   501  217  1.87 |   3.667   442  271  2.06 |   3.318   401  331  2.27 |   3.329   359  372  2.26 |   0.513   14.7
fib 25         75025  12G |   7.589   684   84 |   5.023   580  151  1.51 |   4.218   502  209  1.80 |   3.804   443  264  2.00 |   3.428   409  318  2.21 |   3.226   382  362  2.35 |   0.513   14.8

fib 28        317811   2G |  89.117   834   25 |  54.464   682   51  1.64 |  54.409   532   63  1.64 |  48.101   493   79  1.85 |  35.036   452  115  2.54 |  44.302   406  101 2.011 | SegFault
fib 28        317811   4G |  42.922   826   52 |  25.890   649  109  1.66 |  21.034   567  141  2.04 |  19.416   501  189  2.21 |  16.953   425  256  2.53 |  16.310   411  283  2.63 | SegFault
fib 28        317811   8G |  35.092   777   68 |  22.117   627  133  1.59 |  18.213   541  189  1.93 |  16.366   482  236 2.144 |  14.749   444  290  2.38 |  14.093   397  335  2.49 | SegFault
fib 28        317811  12G |  34.283   750   72 |  22.027   613  138  1.56 |  17.603   551  193  1.95 |  16.007   465  253  2.14 |  14.546   430  301  2.36 |  13.759   386  355  2.49 | SegFault
fib 30        832040  12G | 116.444   804   53 |  76.595   624  104  1.52 |  62.474   559  142  1.86 |  55.308   495  182  2.11 |  49.664   454  222  2.34 |  45.192   406  273  2.58 | SegFault

fibonacci 20    6765   1G |   0.269   759   91 |   0.185   619  163  1.45 |   0.154   551  223  1.75 |   0.141   486  275  1.91 |   0.130   456  320  2.07 |   0.125   417  366  2.15 |   0.018   14.9
fibonacci 25   75025   2G |   2.740   742   95 |   1.841   590  178  1.49 |   1.459   526  255  1.88 |   1.324   451  327  2.07 |   1.172   404  400  2.34 |   1.134   377  461  2.42 |   0.172   15.9
fibonacci 28  317811   8G |  11.752   724   90 |   7.926   583  167  1.48 |   5.928   508  258  1.98 |   5.460   446  321  2.15 |   4.934   404  395  2.38 |   4.519   374  469  2.60 |   0.721   16.3
fibonacci 30  832040   8G |  47.947  1048   45 |  30.137   707   96  1.59 |  23.071   608  144  2.08 |  17.589   513  222  2.73 |  15.243   464  282  3.15 |  16.191   422  292  2.96 |   1.858   25.8

sort200              dft. |   0.123  2226   46 |   0.092  1443   94  1.34 |   0.085  1171  126  1.45 |   0.080   925  168  1.54 |   0.080   766  204  1.54 |   0.080   662  236  1.54 |   0.012   10.3
sort300              dft. |   0.255  2114   44 |   0.191  1341   93  1.34 |   0.155  1115  138  1.65 |   0.152   874  180  1.68 |   0.159   695  217  1.60 |   0.145   626  264  1.76 |   0.023   11.1
sort400              dft. |   0.466  2039   41 |   0.328  1291   92  1.42 |   0.254  1084  142  1.83 |   0.243   851  190  1.92 |   0.223   741  237  2.09 |   0.218   622  289  2.13 |   0.037   12.6
sort500              dft. |   0.802  1924   38 |   0.543  1233   87  1.48 |   0.400  1057  137  2.01 |   0.376   826  187  2.13 |   0.345   709  238  2.32 |   0.332   619  283  2.41 |   0.060   13.4
sortC600              50M |   0.570  1504   91 |   0.439  1054  169  1.30 |   0.367   875  243  1.55 |   0.349   714  313  1.63 |   0.335   607  384  1.70 |   0.326   534  449  1.75 |   0.071   8.03
sortC700              50M |   0.765  1495   91 |   0.586  1044  170  1.31 |   0.492   862  246  1.55 |   0.460   708  319  1.66 |   0.430   627  387  1.78 |   0.429   530  458  1.78 |   0.095   8.05
sortC800              50M |   1.002  1474   91 |   0.756  1039  170  1.31 |   0.625   869  246  1.55 |   0.586   714  320  1.66 |   0.558   613  391  1.78 |   0.557   526  458  1.78 |   0.121   8.05
sortC900              50M |   1.274  1451   90 |   0.955  1032  170  1.33 |   0.798   850  248  1.60 |   0.741   707  320  1.72 |   0.698   620  393  1.83 |   0.688   530  460  1.85 |   0.155   8.22
sortC1000            dft. |   3.810  1741   31 |   2.307  1158   77  1.65 |   1.602   995  128  2.38 |   1.404   790  184  2.71 |   1.215   682  247  3.14 |   1.105   600  309  3.45 |   0.196   19.4

sortC1000            100M |   1.599  1410   91 |   1.196   997  173  1.34 |   0.999   827  249  1.60 |   0.902   700  326  1.77 |   0.853   605  399  1.87 |   0.845   520  469  1.89 |   0.196   8.16
sortC2000            100M |   6.805  1292   90 |   4.915   936  172  1.38 |   3.939   800  251  1.73 |   3.599   671  328  1.89 |   3.323   592  404  2.05 |   3.239   510  480  2.10 | StackOverflow
sortC3000            100M |  16.932  1174   88 |  11.656   892  169  1.45 |   9.261   768  248  1.83 |   8.344   654  323  2.03 |   7.674   579  397  2.21 |   7.430   503  472  2.28 | StackOverflow
sortC4000            100M |  32.337  1108   87 |  21.991   854  166  1.47 |  17.454   737  242  1.85 |  15.580   634  315  2.08 |  14.112   567  390  2.29 |  13.638   497  460  2.37 | StackOverflow
sortC5000            100M |  54.673  1037   85 |  36.384   830  161  1.50 |  28.286   722  238  1.93 |  25.387   620  308  2.15 |  22.902   558  380  2.38 |  21.701   492  455  2.52 | StackOverflow
sortC10000             1G | 247.709   839   93 | 165.529   650  180  1.50 | 131.459   554  265  1.88 | 110.329   506  346  2.25 |  97.315   472  420  2.55 |  89.930   424  506  2.75 | StackOverflow

Table 3: Benchmarks for Inets programs 


Just adding cores to a RunInets run without any heap settings (see the “dft.” rows) appears to
produce relatively nice speed-ups for fine-grained parallelism, but one has to be aware that the
single-core execution in that case typically was spending a far larger portion of its time in garbage
collection than the multi-core versions. (This applies also for “relatively small” fixed heaps.)

It appears to be more honest to consider the speed-ups compared to single-core executions with a 
“good” fixed heap setting; the fastest runs on our six-core machine with our parallel interpreter all use 
five or six cores, and tend to take only about five to six times as long as the compiled Inets runs on a 
single core. 

(For reasons I have not investigated, the Inets-compiled executables crashed for the larger fib.inet runs after 
producing partial output; on a modified version (fibNat.inet) that converts results from unary representation to int 
attributes, all Inets runs crashed. For sort.inet, the Inets version was originally changed only by adding longer 
argument lists to the start net; beyond 500 elements, this led to stack overflow errors in the javacc-generated
parser. Changing the start net definition to a sequence of equations each adding a smaller chunk to the list allowed 
us to make some progress, but beyond 1000 elements, a different stack overflow occurred.) 


6 Conclusion 

Interaction nets as an “inherently parallel” execution model promise large speed-ups via parallelisation, 
but accessible platforms for experimentation are still missing. 

Using Concurrent Haskell to implement interaction nets understood as an execution mechanism, we 
achieved a simple and easily understandable implementation, the entire core of which could be presented 
in just a bit more than three pages of literate code. By having added support for the Inets file format, 
we enable experimentation with interaction net definitions in the shape used by most of the current 
interaction net literature — with the restriction that a consistent polarity assignment must be possible 
(which is also one of the conditions of Lafont [Laf90] for deadlock safety).

Keeping in mind that, in our straight-forward ultrafine-grained implementation, the concurrent
interaction net rules reduce a heavily shared structure, and given that we made no effort to enable coarse-grain
parallelism, the speed-ups achieved on the usual microbenchmarks are actually surprisingly good, and 
we expect even better behaviour on rules with larger right-hand sides that give rise to more sparsely 
connected nets. 